Line-Level Layout Recognition of Historical Documents with Background Knowledge
Abstract
Digitization and transcription of historic documents offer new research opportunities for humanists and are the topics of many edition projects. However, manual work is still required for the main phases of layout recognition and the subsequent optical character recognition (OCR) of early printed documents. This paper describes and evaluates how deep learning approaches that recognize text lines can be extended to layout recognition using background knowledge. The evaluation was performed on five corpora of early prints from the 15th and 16th centuries, representing a variety of layout features. While the main text with standard layouts could be recognized in the correct reading order with a precision and recall of up to 99.9%, complex layouts were also recognized at a rate as high as 90% by using background knowledge, the full potential of which was revealed if many pages of the same source were transcribed.
Similar resources
Text Line Extraction from Complex Layout Documents
There are numerous stylish documents which do not have the traditional text layouts where printed text regions are not parallel to each other. Such complex layouts make text line extraction challenging due to multi-orientation of paragraphs. This paper introduces a system for the text line extraction from the complex layout documents. Proposed method is based on the concept of dilation and hist...
Structure Recognition of Table-Form Documents on the Basis of the Automatic Acquisition of Layout Knowledge
Handwritten Text Recognition for Historical Documents
The amount of digitized legacy documents has been rising dramatically over the last years due mainly to the increasing number of on-line digital libraries publishing this kind of documents. The vast majority of them remain waiting to be transcribed into a textual electronic format (such as ASCII or PDF) that would provide historians and other researchers new ways of indexing, consulting and que...
Integrating Optical Character Recognition and Machine Translation of Historical Documents
Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in non-accessible formats for machine processing (e.g., Historical or Legal documents). Such documents must be passed through a process of Optical Character Recognition (OCR) to render the text suitable for MT. No matter how...
Classification and Recognition of Neume Note Notation in Historical Documents
Neume musical notation is a type of writing for Christian Orthodox Church chant that originated in ancient Byzantium and is still used by the Orthodox Church. The wide variety of preserved historical documents that contain neumes provides rich material for investigations in fields like history, cultural sciences, etc. This creates a natural need for a software package which can help these investig...
Journal
Journal title: Algorithms
Year: 2023
ISSN: 1999-4893
DOI: https://doi.org/10.3390/a16030136